Overview

Dataset statistics

Number of variables: 25
Number of observations: 32060
Missing cells: 29033
Missing cells (%): 3.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 25.1 MiB
Average record size in memory: 822.3 B
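
The missing-cell percentage can be re-derived from the counts in this table. The sketch below is illustrative only: the variable names are made up for the example, and it assumes the percentage is taken over all cells (variables × observations), which is consistent with the reported 3.6%.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 25
n_observations = 32060
missing_cells = 29033

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
```

Note that the Warnings section attributes all 29033 missing cells to a single column, target.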

Variable types

Numeric: 12
Categorical: 13

Warnings

target is highly correlated with created_account (High correlation)
has_married is highly correlated with marital_status (High correlation)
created_account is highly correlated with target (High correlation)
marital_status is highly correlated with has_married (High correlation)
target has 29033 (90.6%) missing values (Missing)
capital_gain has 29380 (91.6%) zeros (Zeros)
capital_loss has 30568 (95.3%) zeros (Zeros)
total_months_with_employer has 494 (1.5%) zeros (Zeros)
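
The percentages in the zero-value warnings follow from the counts and the number of observations. This is a minimal sketch using only the figures reported here; the helper name `zero_share` is hypothetical and is not part of pandas-profiling.

```python
# Counts copied from the warnings; 32060 is the number of observations.
n_observations = 32060
zero_counts = {
    "capital_gain": 29380,
    "capital_loss": 30568,
    "total_months_with_employer": 494,
}


def zero_share(count: int, n: int) -> float:
    """Return the percentage of rows that are zero."""
    return 100 * count / n


shares = {
    name: round(zero_share(count, n_observations), 1)
    for name, count in zero_counts.items()
}
print(shares)
```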

Reproduction

Analysis started: 2021-11-28 16:48:32.282302
Analysis finished: 2021-11-28 16:48:59.740743
Duration: 27.46 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml
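
The reported duration is the difference between the two timestamps. A small check, using only the standard library and the values shown in this section:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" section.
started = datetime.fromisoformat("2021-11-28 16:48:32.282302")
finished = datetime.fromisoformat("2021-11-28 16:48:59.740743")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```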

Variables

age
Real number (ℝ≥0)

Distinct: 73
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.56481597
Minimum: 17
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 250.6 KiB

Quantile statistics

Minimum: 17
5-th percentile: 19
Q1: 28
median: 37
Q3: 48
95-th percentile: 63
Maximum: 90
Range: 73
Interquartile range (IQR): 20
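
Range and IQR are derived from the other quantiles in this table. The sketch below recomputes them for age; the dictionary keys are illustrative names, not identifiers from the report.

```python
# Quantiles for age, copied from the table.
quantiles = {"min": 17, "p5": 19, "q1": 28, "median": 37, "q3": 48, "p95": 63, "max": 90}

value_range = quantiles["max"] - quantiles["min"]  # spread of the whole variable
iqr = quantiles["q3"] - quantiles["q1"]            # spread of the middle 50%

print(value_range, iqr)
```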

Descriptive statistics

Standard deviation: 13.63753249
Coefficient of variation (CV): 0.3536262821
Kurtosis: -0.1711930329
Mean: 38.56481597
Median Absolute Deviation (MAD): 10
Skewness: 0.5582934334
Sum: 1236388
Variance: 185.9822924
Monotonicity: Not monotonic
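
Several of these statistics are linked by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A sketch that checks the age figures against each other (variable names are illustrative):

```python
# Figures for age, copied from the report.
n_observations = 32060
total = 1236388            # "Sum"
std = 13.63753249          # "Standard deviation"

mean = total / n_observations
cv = std / mean            # coefficient of variation
variance = std ** 2

print(f"mean={mean:.8f} cv={cv:.10f} variance={variance:.7f}")
```

The results agree with the reported mean, CV and variance up to rounding of the inputs.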
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
36                 883    2.8%
31                 873    2.7%
34                 867    2.7%
35                 865    2.7%
23                 864    2.7%
33                 861    2.7%
28                 856    2.7%
30                 848    2.6%
37                 841    2.6%
25                 832    2.6%
Other values (63)  23470  73.2%

Value  Count  Frequency (%)
17     393    1.2%
18     541    1.7%
19     703    2.2%
20     745    2.3%
21     711    2.2%
22     749    2.3%
23     864    2.7%
24     786    2.5%
25     832    2.6%
26     771    2.4%

Value  Count  Frequency (%)
90     41     0.1%
88     3      < 0.1%
87     1      < 0.1%
86     1      < 0.1%
85     2      < 0.1%
84     10     < 0.1%
83     6      < 0.1%
82     11     < 0.1%
81     19     0.1%
80     22     0.1%

marital_status
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Married-civ-spouse: 14747
Never-married: 10531
Divorced: 4365
Separated: 1007
Widowed: 976
Other values (2): 434

Length

Max length: 21
Median length: 13
Mean length: 14.41628197
Min length: 7

Characters and Unicode

Total characters: 462186
Distinct characters: 24
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
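
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories over three sample values; it is not how the report was produced, only an example of the idea. The standard library does not expose script or block properties, so those are not covered.

```python
import unicodedata
from collections import Counter

# Three sample values from marital_status.
values = ["Married-civ-spouse", "Never-married", "Divorced"]

# Lu = Uppercase Letter, Ll = Lowercase Letter, Pd = Dash Punctuation.
categories = Counter(unicodedata.category(ch) for value in values for ch in value)

print(dict(categories))
```

The three categories found here match the three reported for marital_status.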

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Never-married
2nd row: Married-civ-spouse
3rd row: Divorced
4th row: Married-civ-spouse
5th row: Married-civ-spouse
Value                  Count  Frequency (%)
Married-civ-spouse     14747  46.0%
Never-married          10531  32.8%
Divorced               4365   13.6%
Separated              1007   3.1%
Widowed                976    3.0%
Married-spouse-absent  411    1.3%
Married-AF-spouse      23     0.1%
Histogram of lengths of the category
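
The length statistics for marital_status can be reconstructed from the category counts alone, since every row holds one of seven strings. A minimal sketch (variable names are illustrative):

```python
# Category counts for marital_status, copied from the frequency table.
counts = {
    "Married-civ-spouse": 14747,
    "Never-married": 10531,
    "Divorced": 4365,
    "Separated": 1007,
    "Widowed": 976,
    "Married-spouse-absent": 411,
    "Married-AF-spouse": 23,
}

n_rows = sum(counts.values())
total_characters = sum(len(value) * count for value, count in counts.items())
mean_length = total_characters / n_rows
lengths = [len(value) for value in counts]

print(n_rows, total_characters, round(mean_length, 8), min(lengths), max(lengths))
```

The totals agree with the reported 462186 characters, mean length 14.41628197, min length 7 and max length 21.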
Value                  Count  Frequency (%)
married-civ-spouse     14747  46.0%
never-married          10531  32.8%
divorced               4365   13.6%
separated              1007   3.1%
widowed                976    3.0%
married-spouse-absent  411    1.3%
married-af-spouse      23     0.1%

Most occurring characters

ValueCountFrequency (%)
e69721
15.1%
r67327
14.6%
i45800
9.9%
-40893
8.8%
d33036
7.1%
s30773
 
6.7%
v29643
 
6.4%
a28137
 
6.1%
o20522
 
4.4%
c19112
 
4.1%
Other values (14)77222
16.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter389187
84.2%
Dash Punctuation40893
 
8.8%
Uppercase Letter32106
 
6.9%

Most frequent character per category

ValueCountFrequency (%)
e69721
17.9%
r67327
17.3%
i45800
11.8%
d33036
8.5%
s30773
7.9%
v29643
7.6%
a28137
7.2%
o20522
 
5.3%
c19112
 
4.9%
p16188
 
4.2%
Other values (6)28928
7.4%
ValueCountFrequency (%)
M15181
47.3%
N10531
32.8%
D4365
 
13.6%
S1007
 
3.1%
W976
 
3.0%
A23
 
0.1%
F23
 
0.1%
ValueCountFrequency (%)
-40893
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin421293
91.2%
Common40893
 
8.8%

Most frequent character per script

ValueCountFrequency (%)
e69721
16.5%
r67327
16.0%
i45800
10.9%
d33036
7.8%
s30773
7.3%
v29643
7.0%
a28137
6.7%
o20522
 
4.9%
c19112
 
4.5%
p16188
 
3.8%
Other values (13)61034
14.5%
ValueCountFrequency (%)
-40893
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII462186
100.0%

Most frequent character per block

ValueCountFrequency (%)
e69721
15.1%
r67327
14.6%
i45800
9.9%
-40893
8.8%
d33036
7.1%
s30773
 
6.7%
v29643
 
6.4%
a28137
 
6.1%
o20522
 
4.4%
c19112
 
4.1%
Other values (14)77222
16.7%

education
Categorical

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB

HS-grad: 10347
Some-college: 7190
Bachelors: 5278
Masters: 1693
Assoc-voc: 1357
Other values (11): 6195

Length

Max length: 12
Median length: 7
Mean length: 8.435776669
Min length: 3

Characters and Unicode

Total characters: 270451
Distinct characters: 31
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Bachelors
2nd row: Bachelors
3rd row: HS-grad
4th row: 11th
5th row: Bachelors
Value             Count  Frequency (%)
HS-grad           10347  32.3%
Some-college      7190   22.4%
Bachelors         5278   16.5%
Masters           1693   5.3%
Assoc-voc         1357   4.2%
11th              1159   3.6%
Assoc-acdm        1050   3.3%
10th              919    2.9%
7th-8th           634    2.0%
Prof-school       567    1.8%
Other values (6)  1866   5.8%
Histogram of lengths of the category
Value             Count  Frequency (%)
hs-grad           10347  32.3%
some-college      7190   22.4%
bachelors         5278   16.5%
masters           1693   5.3%
assoc-voc         1357   4.2%
11th              1159   3.6%
assoc-acdm        1050   3.3%
10th              919    2.9%
7th-8th           634    2.0%
prof-school       567    1.8%
Other values (6)  1866   5.8%

Most occurring characters

ValueCountFrequency (%)
e28991
10.7%
o26023
 
9.6%
-21638
 
8.0%
l20273
 
7.5%
a18770
 
6.9%
r18335
 
6.8%
c18299
 
6.8%
S17537
 
6.5%
g17537
 
6.5%
s14258
 
5.3%
Other values (21)68790
25.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter202782
75.0%
Uppercase Letter38279
 
14.2%
Dash Punctuation21638
 
8.0%
Decimal Number7752
 
2.9%

Most frequent character per category

ValueCountFrequency (%)
e28991
14.3%
o26023
12.8%
l20273
10.0%
a18770
9.3%
r18335
9.0%
c18299
9.0%
g17537
8.6%
s14258
7.0%
d11397
 
5.6%
h10983
 
5.4%
Other values (4)17916
8.8%
ValueCountFrequency (%)
13821
49.3%
0919
 
11.9%
7634
 
8.2%
8634
 
8.2%
9504
 
6.5%
2419
 
5.4%
5328
 
4.2%
6328
 
4.2%
4165
 
2.1%
ValueCountFrequency (%)
S17537
45.8%
H10347
27.0%
B5278
 
13.8%
A2407
 
6.3%
M1693
 
4.4%
P615
 
1.6%
D402
 
1.1%
ValueCountFrequency (%)
-21638
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin241061
89.1%
Common29390
 
10.9%

Most frequent character per script

ValueCountFrequency (%)
e28991
12.0%
o26023
10.8%
l20273
8.4%
a18770
 
7.8%
r18335
 
7.6%
c18299
 
7.6%
S17537
 
7.3%
g17537
 
7.3%
s14258
 
5.9%
d11397
 
4.7%
Other values (11)49641
20.6%
ValueCountFrequency (%)
-21638
73.6%
13821
 
13.0%
0919
 
3.1%
7634
 
2.2%
8634
 
2.2%
9504
 
1.7%
2419
 
1.4%
5328
 
1.1%
6328
 
1.1%
4165
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII270451
100.0%

Most frequent character per block

ValueCountFrequency (%)
e28991
10.7%
o26023
 
9.6%
-21638
 
8.0%
l20273
 
7.5%
a18770
 
6.9%
r18335
 
6.8%
c18299
 
6.8%
S17537
 
6.5%
g17537
 
6.5%
s14258
 
5.3%
Other values (21)68790
25.4%

occupation_level
Real number (ℝ≥0)

Distinct: 20
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.757673113
Minimum: 1
Maximum: 20
Zeros: 0
Zeros (%): 0.0%
Memory size: 250.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 5
median: 8
Q3: 10
95-th percentile: 14
Maximum: 20
Range: 19
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.859709422
Coefficient of variation (CV): 0.4975344238
Kurtosis: -0.4598044946
Mean: 7.757673113
Median Absolute Deviation (MAD): 3
Skewness: 0.2736765035
Sum: 248711
Variance: 14.89735682
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)
Value              Count  Frequency (%)
8                  3687   11.5%
6                  3485   10.9%
10                 3172   9.9%
4                  2698   8.4%
7                  2388   7.4%
12                 2224   6.9%
9                  2136   6.7%
5                  2008   6.3%
2                  1827   5.7%
3                  1777   5.5%
Other values (10)  6658   20.8%

Value  Count  Frequency (%)
1      1208   3.8%
2      1827   5.7%
3      1777   5.5%
4      2698   8.4%
5      2008   6.3%
6      3485   10.9%
7      2388   7.4%
8      3687   11.5%
9      2136   6.7%
10     3172   9.9%

Value  Count  Frequency (%)
20     31     0.1%
19     56     0.2%
18     207    0.6%
17     179    0.6%
16     417    1.3%
15     440    1.4%
14     1083   3.4%
13     1319   4.1%
12     2224   6.9%
11     1718   5.4%

education_num
Real number (ℝ≥0)

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.20761073
Minimum: 1
Maximum: 21
Zeros: 0
Zeros (%): 0.0%
Memory size: 250.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 12
median: 13
Q3: 16
95-th percentile: 18
Maximum: 21
Range: 20
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.353796886
Coefficient of variation (CV): 0.2539291137
Kurtosis: 0.7716332024
Mean: 13.20761073
Median Absolute Deviation (MAD): 1
Skewness: -0.364423994
Sum: 423436
Variance: 11.24795355
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Value             Count  Frequency (%)
12                10347  32.3%
13                7190   22.4%
17                5278   16.5%
18                1693   5.3%
14                1357   4.2%
9                 1159   3.6%
16                1050   3.3%
8                 919    2.9%
5                 634    2.0%
20                567    1.8%
Other values (6)  1866   5.8%

Value  Count  Frequency (%)
1      48     0.1%
3      165    0.5%
4      328    1.0%
5      634    2.0%
6      504    1.6%
8      919    2.9%
9      1159   3.6%
10     419    1.3%
12     10347  32.3%
13     7190   22.4%

Value  Count  Frequency (%)
21     402    1.3%
20     567    1.8%
18     1693   5.3%
17     5278   16.5%
16     1050   3.3%
14     1357   4.2%
13     7190   22.4%
12     10347  32.3%
10     419    1.3%
9      1159   3.6%

familiarity_FB
Real number (ℝ≥0)

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.29033063
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 250.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 5
Q3: 8
95-th percentile: 9
Maximum: 10
Range: 9
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.673795294
Coefficient of variation (CV): 0.5054117561
Kurtosis: -1.163045096
Mean: 5.29033063
Median Absolute Deviation (MAD): 2
Skewness: 0.0116450518
Sum: 169608
Variance: 7.149181275
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value  Count  Frequency (%)
5      3547   11.1%
2      3504   10.9%
7      3494   10.9%
6      3493   10.9%
8      3486   10.9%
9      3467   10.8%
4      3461   10.8%
3      3432   10.7%
1      2838   8.9%
10     1338   4.2%

Value  Count  Frequency (%)
1      2838   8.9%
2      3504   10.9%
3      3432   10.7%
4      3461   10.8%
5      3547   11.1%
6      3493   10.9%
7      3494   10.9%
8      3486   10.9%
9      3467   10.8%
10     1338   4.2%

Value  Count  Frequency (%)
10     1338   4.2%
9      3467   10.8%
8      3486   10.9%
7      3494   10.9%
6      3493   10.9%
5      3547   11.1%
4      3461   10.8%
3      3432   10.7%
2      3504   10.9%
1      2838   8.9%

view_FB
Real number (ℝ≥0)

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.170929507
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 250.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 5
Q3: 7
95-th percentile: 9
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.550474931
Coefficient of variation (CV): 0.4932333591
Kurtosis: -1.137469448
Mean: 5.170929507
Median Absolute Deviation (MAD): 2
Skewness: 0.01187618404
Sum: 165780
Variance: 6.504922371
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value  Count  Frequency (%)
6      3777   11.8%
4      3739   11.7%
3      3725   11.6%
8      3663   11.4%
7      3634   11.3%
5      3593   11.2%
2      3528   11.0%
9      3179   9.9%
1      2623   8.2%
10     599    1.9%

Value  Count  Frequency (%)
1      2623   8.2%
2      3528   11.0%
3      3725   11.6%
4      3739   11.7%
5      3593   11.2%
6      3777   11.8%
7      3634   11.3%
8      3663   11.4%
9      3179   9.9%
10     599    1.9%

Value  Count  Frequency (%)
10     599    1.9%
9      3179   9.9%
8      3663   11.4%
7      3634   11.3%
6      3777   11.8%
5      3593   11.2%
4      3739   11.7%
3      3725   11.6%
2      3528   11.0%
1      2623   8.2%
Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 MiB

0: 18438
1: 13622

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 32060
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 0
5th row: 1
Value  Count  Frequency (%)
0      18438  57.5%
1      13622  42.5%
Histogram of lengths of the category
Value  Count  Frequency (%)
0      18438  57.5%
1      13622  42.5%

Most occurring characters

ValueCountFrequency (%)
018438
57.5%
113622
42.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number32060
100.0%

Most frequent character per category

ValueCountFrequency (%)
018438
57.5%
113622
42.5%

Most occurring scripts

ValueCountFrequency (%)
Common32060
100.0%

Most frequent character per script

ValueCountFrequency (%)
018438
57.5%
113622
42.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII32060
100.0%

Most frequent character per block

ValueCountFrequency (%)
018438
57.5%
113622
42.5%

created_account
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB

unknown: 29033
No: 2787
Yes: 240

Length

Max length: 7
Median length: 7
Mean length: 6.535402371
Min length: 2

Characters and Unicode

Total characters: 209525
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No
Value    Count  Frequency (%)
unknown  29033  90.6%
No       2787   8.7%
Yes      240    0.7%
Histogram of lengths of the category
Value    Count  Frequency (%)
unknown  29033  90.6%
no       2787   8.7%
yes      240    0.7%

Most occurring characters

ValueCountFrequency (%)
n87099
41.6%
o31820
 
15.2%
u29033
 
13.9%
k29033
 
13.9%
w29033
 
13.9%
N2787
 
1.3%
Y240
 
0.1%
e240
 
0.1%
s240
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter206498
98.6%
Uppercase Letter3027
 
1.4%

Most frequent character per category

ValueCountFrequency (%)
n87099
42.2%
o31820
 
15.4%
u29033
 
14.1%
k29033
 
14.1%
w29033
 
14.1%
e240
 
0.1%
s240
 
0.1%
ValueCountFrequency (%)
N2787
92.1%
Y240
 
7.9%

Most occurring scripts

ValueCountFrequency (%)
Latin209525
100.0%

Most frequent character per script

ValueCountFrequency (%)
n87099
41.6%
o31820
 
15.2%
u29033
 
13.9%
k29033
 
13.9%
w29033
 
13.9%
N2787
 
1.3%
Y240
 
0.1%
e240
 
0.1%
s240
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII209525
100.0%

Most frequent character per block

ValueCountFrequency (%)
n87099
41.6%
o31820
 
15.2%
u29033
 
13.9%
k29033
 
13.9%
w29033
 
13.9%
N2787
 
1.3%
Y240
 
0.1%
e240
 
0.1%
s240
 
0.1%

has_married
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 MiB

1: 17164
0: 14896

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 32060
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 1
5th row: 1
Value  Count  Frequency (%)
1      17164  53.5%
0      14896  46.5%
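
The report flags has_married as highly correlated with marital_status but does not say how has_married was derived. The sketch below tests one grouping of marital_status categories whose counts add up exactly to the has_married totals. The grouping itself is an assumption inferred from the counts, not something the report states, and the variable names are illustrative.

```python
# Category counts copied from the marital_status frequency table.
marital_counts = {
    "Married-civ-spouse": 14747,
    "Never-married": 10531,
    "Divorced": 4365,
    "Separated": 1007,
    "Widowed": 976,
    "Married-spouse-absent": 411,
    "Married-AF-spouse": 23,
}

# ASSUMPTION: categories mapped to has_married == 1, inferred from the totals only.
assumed_married = {
    "Married-civ-spouse",
    "Married-spouse-absent",
    "Married-AF-spouse",
    "Separated",
    "Widowed",
}

ones = sum(count for status, count in marital_counts.items() if status in assumed_married)
zeros = sum(marital_counts.values()) - ones

print(ones, zeros)
```

Under that assumed grouping the totals reproduce the 17164 and 14896 rows reported for has_married, which would explain the correlation warning.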
Histogram of lengths of the category
Value  Count  Frequency (%)
1      17164  53.5%
0      14896  46.5%

Most occurring characters

ValueCountFrequency (%)
117164
53.5%
014896
46.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number32060
100.0%

Most frequent character per category

ValueCountFrequency (%)
117164
53.5%
014896
46.5%

Most occurring scripts

ValueCountFrequency (%)
Common32060
100.0%

Most frequent character per script

ValueCountFrequency (%)
117164
53.5%
014896
46.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII32060
100.0%

Most frequent character per block

ValueCountFrequency (%)
117164
53.5%
014896
46.5%

education_order
Real number (ℝ≥0)

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.851528384
Minimum: 1
Maximum: 16
Zeros: 0
Zeros (%): 0.0%
Memory size: 250.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 9
median: 10
Q3: 11
95-th percentile: 15
Maximum: 16
Range: 15
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.411522885
Coefficient of variation (CV): 0.2447866758
Kurtosis: 1.603895409
Mean: 9.851528384
Median Absolute Deviation (MAD): 1
Skewness: -0.1831421012
Sum: 315840
Variance: 5.815442625
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Value             Count  Frequency (%)
9                 10347  32.3%
10                7190   22.4%
11                5278   16.5%
12                1693   5.3%
14                1357   4.2%
7                 1159   3.6%
15                1050   3.3%
6                 919    2.9%
4                 634    2.0%
16                567    1.8%
Other values (6)  1866   5.8%

Value  Count  Frequency (%)
1      48     0.1%
2      165    0.5%
3      328    1.0%
4      634    2.0%
5      504    1.6%
6      919    2.9%
7      1159   3.6%
8      419    1.3%
9      10347  32.3%
10     7190   22.4%

Value  Count  Frequency (%)
16     567    1.8%
15     1050   3.3%
14     1357   4.2%
13     402    1.3%
12     1693   5.3%
11     5278   16.5%
10     7190   22.4%
9      10347  32.3%
8      419    1.3%
7      1159   3.6%

job_title_top_10
Categorical

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB

Other: 27900
Accountant, chartered: 484
Engineer, manufacturing: 481
Amenity horticulturist: 462
Tutor: 456
Other values (6): 2277

Length

Max length: 33
Median length: 5
Mean length: 7.105427324
Min length: 5

Characters and Unicode

Total characters: 227800
Distinct characters: 27
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Other
2nd row: Other
3rd row: Other
4th row: Other
5th row: Other
Value                              Count  Frequency (%)
Other                              27900  87.0%
Accountant, chartered              484    1.5%
Engineer, manufacturing            481    1.5%
Amenity horticulturist             462    1.4%
Tutor                              456    1.4%
Education officer, community       443    1.4%
Environmental health practitioner  436    1.4%
Event organiser                    421    1.3%
Conservator, museum/gallery        329    1.0%
Clinical psychologist              325    1.0%
Histogram of lengths of the category
Value              Count  Frequency (%)
other              27900  76.1%
chartered          484    1.3%
accountant         484    1.3%
manufacturing      481    1.3%
engineer           481    1.3%
amenity            462    1.3%
horticulturist     462    1.3%
tutor              456    1.2%
community          443    1.2%
officer            443    1.2%
Other values (12)  4547   12.4%

Most occurring characters

ValueCountFrequency (%)
t36165
15.9%
r34790
15.3%
e34518
15.2%
h30366
13.3%
O27900
12.2%
n7803
 
3.4%
i7027
 
3.1%
a5731
 
2.5%
o5655
 
2.5%
c5456
 
2.4%
Other values (17)32389
14.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter188768
82.9%
Uppercase Letter32383
 
14.2%
Space Separator4583
 
2.0%
Other Punctuation2066
 
0.9%

Most frequent character per category

ValueCountFrequency (%)
t36165
19.2%
r34790
18.4%
e34518
18.3%
h30366
16.1%
n7803
 
4.1%
i7027
 
3.7%
a5731
 
3.0%
o5655
 
3.0%
c5456
 
2.9%
u4370
 
2.3%
Other values (9)16887
8.9%
ValueCountFrequency (%)
O27900
86.2%
E1781
 
5.5%
A1269
 
3.9%
T779
 
2.4%
C654
 
2.0%
ValueCountFrequency (%)
,1737
84.1%
/329
 
15.9%
ValueCountFrequency (%)
4583
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin221151
97.1%
Common6649
 
2.9%

Most frequent character per script

ValueCountFrequency (%)
t36165
16.4%
r34790
15.7%
e34518
15.6%
h30366
13.7%
O27900
12.6%
n7803
 
3.5%
i7027
 
3.2%
a5731
 
2.6%
o5655
 
2.6%
c5456
 
2.5%
Other values (14)25740
11.6%
ValueCountFrequency (%)
4583
68.9%
,1737
 
26.1%
/329
 
4.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII227800
100.0%

Most frequent character per block

ValueCountFrequency (%)
t36165
15.9%
r34790
15.3%
e34518
15.2%
h30366
13.3%
O27900
12.2%
n7803
 
3.4%
i7027
 
3.1%
a5731
 
2.5%
o5655
 
2.5%
c5456
 
2.4%
Other values (17)32389
14.2%
Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB

Other: 30182
smith.com: 379
jones.com: 309
williams.com: 206
brown.com: 176
Other values (6): 808

Length

Max length: 12
Median length: 5
Mean length: 5.28147224
Min length: 5

Characters and Unicode

Total characters: 169324
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: jones.com
2nd row: Other
3rd row: Other
4th row: Other
5th row: Other
Value         Count  Frequency (%)
Other         30182  94.1%
smith.com     379    1.2%
jones.com     309    1.0%
williams.com  206    0.6%
brown.com     176    0.5%
davies.com    163    0.5%
taylor.com    163    0.5%
evans.com     137    0.4%
wilson.com    122    0.4%
johnson.com   112    0.3%
Histogram of lengths of the category
Value         Count  Frequency (%)
other         30182  94.1%
smith.com     379    1.2%
jones.com     309    1.0%
williams.com  206    0.6%
brown.com     176    0.5%
davies.com    163    0.5%
taylor.com    163    0.5%
evans.com     137    0.4%
wilson.com    122    0.4%
johnson.com   112    0.3%

Most occurring characters

ValueCountFrequency (%)
e30902
18.3%
t30835
18.2%
r30743
18.2%
h30673
18.1%
O30182
17.8%
o2983
 
1.8%
m2463
 
1.5%
.1878
 
1.1%
c1878
 
1.1%
s1539
 
0.9%
Other values (10)5248
 
3.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter137264
81.1%
Uppercase Letter30182
 
17.8%
Other Punctuation1878
 
1.1%

Most frequent character per category

ValueCountFrequency (%)
e30902
22.5%
t30835
22.5%
r30743
22.4%
h30673
22.3%
o2983
 
2.2%
m2463
 
1.8%
c1878
 
1.4%
s1539
 
1.1%
i1076
 
0.8%
n968
 
0.7%
Other values (8)3204
 
2.3%
ValueCountFrequency (%)
.1878
100.0%
ValueCountFrequency (%)
O30182
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin167446
98.9%
Common1878
 
1.1%

Most frequent character per script

ValueCountFrequency (%)
e30902
18.5%
t30835
18.4%
r30743
18.4%
h30673
18.3%
O30182
18.0%
o2983
 
1.8%
m2463
 
1.5%
c1878
 
1.1%
s1539
 
0.9%
i1076
 
0.6%
Other values (9)4172
 
2.5%
ValueCountFrequency (%)
.1878
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII169324
100.0%

Most frequent character per block

ValueCountFrequency (%)
e30902
18.3%
t30835
18.2%
r30743
18.2%
h30673
18.1%
O30182
17.8%
o2983
 
1.8%
m2463
 
1.5%
.1878
 
1.1%
c1878
 
1.1%
s1539
 
0.9%
Other values (10)5248
 
3.1%

hours_per_week
Real number (ℝ≥0)

Distinct: 94
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.4334685
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 250.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 18
Q1: 40
median: 40
Q3: 45
95-th percentile: 60
Maximum: 99
Range: 98
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 12.33371899
Coefficient of variation (CV): 0.3050373725
Kurtosis: 2.919402385
Mean: 40.4334685
Median Absolute Deviation (MAD): 3
Skewness: 0.2268609529
Sum: 1296297
Variance: 152.1206241
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
40                 15004  46.8%
50                 2779   8.7%
45                 1788   5.6%
60                 1453   4.5%
35                 1277   4.0%
20                 1201   3.7%
30                 1135   3.5%
55                 681    2.1%
25                 663    2.1%
48                 508    1.6%
Other values (84)  5571   17.4%
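
The "Other values" row is the remainder after the ten most frequent values are listed. The sketch below reconstructs it for hours_per_week from the figures in this section; variable names are illustrative.

```python
# Figures for hours_per_week, copied from the report.
n_observations = 32060
n_distinct = 94
top_counts = {
    40: 15004, 50: 2779, 45: 1788, 60: 1453, 35: 1277,
    20: 1201, 30: 1135, 55: 681, 25: 663, 48: 508,
}

other_distinct = n_distinct - len(top_counts)            # values not listed individually
other_count = n_observations - sum(top_counts.values())  # rows holding those values
other_pct = 100 * other_count / n_observations

print(f"Other values ({other_distinct}): {other_count} ({other_pct:.1f}%)")
```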
Value  Count  Frequency (%)
1      19     0.1%
2      32     0.1%
3      37     0.1%
4      53     0.2%
5      58     0.2%
6      64     0.2%
7      26     0.1%
8      142    0.4%
9      18     0.1%
10     272    0.8%

Value  Count  Frequency (%)
99     83     0.3%
98     11     < 0.1%
97     2      < 0.1%
96     5      < 0.1%
95     2      < 0.1%
94     1      < 0.1%
92     1      < 0.1%
91     3      < 0.1%
90     29     0.1%
89     1      < 0.1%

capital_gain
Real number (ℝ≥0)

ZEROS

Distinct: 119
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1076.781815
Minimum: 0
Maximum: 99999
Zeros: 29380
Zeros (%): 91.6%
Memory size: 250.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 5013
Maximum: 99999
Range: 99999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 7373.301479
Coefficient of variation (CV): 6.847535289
Kurtosis: 155.2741732
Mean: 1076.781815
Median Absolute Deviation (MAD): 0
Skewness: 11.97046873
Sum: 34521625
Variance: 54365574.7
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                   29380  91.6%
15024               342    1.1%
7688                282    0.9%
7298                241    0.8%
99999               156    0.5%
3103                96     0.3%
5178                96     0.3%
4386                70     0.2%
5013                69     0.2%
8614                55     0.2%
Other values (109)  1273   4.0%

Value  Count  Frequency (%)
0      29380  91.6%
114    6      < 0.1%
401    2      < 0.1%
594    34     0.1%
914    8      < 0.1%
991    5      < 0.1%
1055   24     0.1%
1086   3      < 0.1%
1111   1      < 0.1%
1151   8      < 0.1%

Value  Count  Frequency (%)
99999  156    0.5%
41310  2      < 0.1%
34095  5      < 0.1%
27828  33     0.1%
25236  11     < 0.1%
25124  4      < 0.1%
22040  1      < 0.1%
20051  36     0.1%
18481  2      < 0.1%
15831  5      < 0.1%

capital_loss
Real number (ℝ≥0)

ZEROS

Distinct: 91
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 87.09937617
Minimum: 0
Maximum: 4356
Zeros: 30568
Zeros (%): 95.3%
Memory size: 250.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 4356
Range: 4356
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 402.5526524
Coefficient of variation (CV): 4.621762751
Kurtosis: 20.46029417
Mean: 87.09937617
Median Absolute Deviation (MAD): 0
Skewness: 4.602228901
Sum: 2792406
Variance: 162048.638
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
0                  30568  95.3%
1902               198    0.6%
1977               164    0.5%
1887               155    0.5%
1848               51     0.2%
1485               50     0.2%
2415               48     0.1%
1602               45     0.1%
1740               42     0.1%
1590               40     0.1%
Other values (81)  699    2.2%

Value  Count  Frequency (%)
0      30568  95.3%
155    1      < 0.1%
213    4      < 0.1%
323    3      < 0.1%
419    3      < 0.1%
625    12     < 0.1%
653    3      < 0.1%
810    2      < 0.1%
880    5      < 0.1%
974    2      < 0.1%

Value  Count  Frequency (%)
4356   3      < 0.1%
3900   2      < 0.1%
3770   2      < 0.1%
3683   2      < 0.1%
3004   2      < 0.1%
2824   10     < 0.1%
2754   2      < 0.1%
2603   5      < 0.1%
2559   11     < 0.1%
2547   4      < 0.1%

native_country
Categorical

Distinct: 40
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 MiB

United Kingdom: 28726
Scotland: 646
?: 571
Poland: 253
Germany: 132
Other values (35): 1732

Length

Max length: 26
Median length: 14
Mean length: 13.16777916
Min length: 1

Characters and Unicode

Total characters: 422159
Distinct characters: 46
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: United Kingdom
2nd row: United Kingdom
3rd row: United Kingdom
4th row: United Kingdom
5th row: Sweden
Value              Count  Frequency (%)
United Kingdom     28726  89.6%
Scotland           646    2.0%
?                  571    1.8%
Poland             253    0.8%
Germany            132    0.4%
Canada             119    0.4%
Bulgaria           114    0.4%
Wales              105    0.3%
India              100    0.3%
Sweden             95     0.3%
Other values (30)  1199   3.7%
Histogram of lengths of the category
ValueCountFrequency (%)
kingdom28726
47.3%
united28726
47.3%
scotland646
 
1.1%
571
 
0.9%
poland253
 
0.4%
germany132
 
0.2%
canada119
 
0.2%
bulgaria114
 
0.2%
wales105
 
0.2%
india100
 
0.2%
Other values (31)1294
 
2.1%

Most occurring characters

ValueCountFrequency (%)
n59564
14.1%
d58974
14.0%
i58267
13.8%
o29935
7.1%
t29759
7.0%
e29628
7.0%
m29225
6.9%
g29078
6.9%
U28754
6.8%
28726
6.8%
Other values (36)40249
9.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter332439
78.7%
Uppercase Letter60333
 
14.3%
Space Separator28726
 
6.8%
Other Punctuation590
 
0.1%
Dash Punctuation43
 
< 0.1%
Open Punctuation14
 
< 0.1%
Close Punctuation14
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
U28754
47.7%
K28726
47.6%
S769
 
1.3%
P319
 
0.5%
C268
 
0.4%
I250
 
0.4%
G238
 
0.4%
J139
 
0.2%
E118
 
0.2%
N114
 
0.2%
Other values (10)638
 
1.1%
ValueCountFrequency (%)
n59564
17.9%
d58974
17.7%
i58267
17.5%
o29935
9.0%
t29759
9.0%
e29628
8.9%
m29225
8.8%
g29078
8.7%
a3562
 
1.1%
l1587
 
0.5%
Other values (10)2860
 
0.9%
ValueCountFrequency (%)
?571
96.8%
&19
 
3.2%
ValueCountFrequency (%)
28726
100.0%
ValueCountFrequency (%)
-43
100.0%
ValueCountFrequency (%)
(14
100.0%
ValueCountFrequency (%)
)14
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin392772
93.0%
Common29387
 
7.0%

Most frequent character per script

ValueCountFrequency (%)
n59564
15.2%
d58974
15.0%
i58267
14.8%
o29935
7.6%
t29759
7.6%
e29628
7.5%
m29225
7.4%
g29078
7.4%
U28754
7.3%
K28726
7.3%
Other values (30)10862
 
2.8%
ValueCountFrequency (%)
28726
97.8%
?571
 
1.9%
-43
 
0.1%
&19
 
0.1%
(14
 
< 0.1%
)14
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII422159
100.0%

Most frequent character per block

ValueCountFrequency (%)
n59564
14.1%
d58974
14.0%
i58267
13.8%
o29935
7.1%
t29759
7.0%
e29628
7.0%
m29225
6.9%
g29078
6.9%
U28754
6.8%
28726
6.8%
Other values (36)40249
9.5%

demographic_characteristic
Real number (ℝ≥0)

Distinct: 21423
Distinct (%): 66.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 189843.6625
Minimum: 12285
Maximum: 1484705
Zeros: 0
Zeros (%): 0.0%
Memory size: 250.6 KiB

Quantile statistics

Minimum: 12285
5-th percentile: 39409.85
Q1: 117789
median: 178449
Q3: 237065
95-th percentile: 379778.05
Maximum: 1484705
Range: 1472420
Interquartile range (IQR): 119276

Descriptive statistics

Standard deviation: 105680.6841
Coefficient of variation (CV): 0.5566721729
Kurtosis: 6.262694948
Mean: 189843.6625
Median Absolute Deviation (MAD): 59948.5
Skewness: 1.452524423
Sum: 6086387820
Variance: 1.1168407 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
20348813
 
< 0.1%
12301113
 
< 0.1%
16419012
 
< 0.1%
14899512
 
< 0.1%
11336412
 
< 0.1%
12398311
 
< 0.1%
11148311
 
< 0.1%
15565911
 
< 0.1%
12013111
 
< 0.1%
12656911
 
< 0.1%
Other values (21413)31943
99.6%
ValueCountFrequency (%)
122851
 
< 0.1%
137691
 
< 0.1%
148781
 
< 0.1%
188271
 
< 0.1%
192141
 
< 0.1%
193025
< 0.1%
193952
 
< 0.1%
194101
 
< 0.1%
194911
 
< 0.1%
195201
 
< 0.1%
ValueCountFrequency (%)
14847051
< 0.1%
14554351
< 0.1%
13661201
< 0.1%
12683391
< 0.1%
12265831
< 0.1%
11846221
< 0.1%
11613631
< 0.1%
11256131
< 0.1%
10974531
< 0.1%
10855151
< 0.1%
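
Several of the statistics reported above for demographic_characteristic can be recomputed from the others. The following is a minimal sketch of that arithmetic, using only figures copied from this report; the variable names are illustrative and not part of pandas-profiling.

```python
# Values copied from the demographic_characteristic summary in this report.
minimum, maximum = 12285, 1484705
q1, q3 = 117789, 237065
mean, std = 189843.6625, 105680.6841
total, n_rows = 6086387820, 32060

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
mean_from_sum = total / n_rows    # Mean recomputed from Sum and row count

print("Range:", value_range)
print("IQR:", iqr)
print("CV:", round(cv, 6))
print("Variance:", f"{variance:.7e}")
print("Mean from Sum:", round(mean_from_sum, 4))
```

Small differences in the last digits are expected because the inputs are the rounded values printed in the report.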

town_adj
Categorical

Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.0 MiB
Edinburgh
19228 
Swindon
5170 
Other
3735 
Leeds
 
1510
Oxford
 
1393

Length

Max length9
Median length9
Mean length7.828852152
Min length5

Characters and Unicode

Total characters250993
Distinct characters21
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowEdinburgh
2nd rowLeeds
3rd rowEdinburgh
4th rowEdinburgh
5th rowSwindon
ValueCountFrequency (%)
Edinburgh19228
60.0%
Swindon5170
 
16.1%
Other3735
 
11.7%
Leeds1510
 
4.7%
Oxford1393
 
4.3%
Bristol1024
 
3.2%
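
The Length and Total characters figures for town_adj follow directly from the value counts listed above. The sketch below rebuilds them from those counts; the dictionary is transcribed from this report, and the variable names are illustrative only.

```python
import statistics

# Category counts transcribed from the town_adj value-count table.
town_counts = {
    "Edinburgh": 19228,
    "Swindon": 5170,
    "Other": 3735,
    "Leeds": 1510,
    "Oxford": 1393,
    "Bristol": 1024,
}

# One string length per observation.
lengths = [len(name) for name, count in town_counts.items() for _ in range(count)]

total_characters = sum(lengths)
mean_length = total_characters / len(lengths)
median_length = statistics.median(lengths)

print("Observations:", len(lengths))
print("Total characters:", total_characters)
print("Min / median / max length:", min(lengths), median_length, max(lengths))
print("Mean length:", round(mean_length, 6))
```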
Histogram of lengths of the category
ValueCountFrequency (%)
edinburgh19228
60.0%
swindon5170
 
16.1%
other3735
 
11.7%
leeds1510
 
4.7%
oxford1393
 
4.3%
bristol1024
 
3.2%

Most occurring characters

ValueCountFrequency (%)
n29568
11.8%
d27301
10.9%
i25422
10.1%
r25380
10.1%
h22963
9.1%
E19228
7.7%
b19228
7.7%
u19228
7.7%
g19228
7.7%
o7587
 
3.0%
Other values (11)35860
14.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter218933
87.2%
Uppercase Letter32060
 
12.8%

Most frequent character per category

ValueCountFrequency (%)
n29568
13.5%
d27301
12.5%
i25422
11.6%
r25380
11.6%
h22963
10.5%
b19228
8.8%
u19228
8.8%
g19228
8.8%
o7587
 
3.5%
e6755
 
3.1%
Other values (6)16273
7.4%
ValueCountFrequency (%)
E19228
60.0%
S5170
 
16.1%
O5128
 
16.0%
L1510
 
4.7%
B1024
 
3.2%

Most occurring scripts

ValueCountFrequency (%)
Latin250993
100.0%

Most frequent character per script

ValueCountFrequency (%)
n29568
11.8%
d27301
10.9%
i25422
10.1%
r25380
10.1%
h22963
9.1%
E19228
7.7%
b19228
7.7%
u19228
7.7%
g19228
7.7%
o7587
 
3.0%
Other values (11)35860
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII250993
100.0%

Most frequent character per block

ValueCountFrequency (%)
n29568
11.8%
d27301
10.9%
i25422
10.1%
r25380
10.1%
h22963
9.1%
E19228
7.7%
b19228
7.7%
u19228
7.7%
g19228
7.7%
o7587
 
3.0%
Other values (11)35860
14.3%

paye_adj
Categorical

Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.9 MiB
Other
26591 
NW384000
3256 
BR442000
 
1414
EE913000
 
799

Length

Max length8
Median length5
Mean length5.511759201
Min length5

Characters and Unicode

Total characters176707
Distinct characters17
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowOther
2nd rowOther
3rd rowOther
4th rowOther
5th rowBR442000
ValueCountFrequency (%)
Other26591
82.9%
NW3840003256
 
10.2%
BR4420001414
 
4.4%
EE913000799
 
2.5%
Histogram of lengths of the category
ValueCountFrequency (%)
other26591
82.9%
nw3840003256
 
10.2%
br4420001414
 
4.4%
ee913000799
 
2.5%

Most occurring characters

ValueCountFrequency (%)
O26591
15.0%
t26591
15.0%
h26591
15.0%
e26591
15.0%
r26591
15.0%
016407
9.3%
46084
 
3.4%
34055
 
2.3%
N3256
 
1.8%
W3256
 
1.8%
Other values (7)10694
6.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter106364
60.2%
Uppercase Letter37529
 
21.2%
Decimal Number32814
 
18.6%

Most frequent character per category

ValueCountFrequency (%)
016407
50.0%
46084
 
18.5%
34055
 
12.4%
83256
 
9.9%
21414
 
4.3%
9799
 
2.4%
1799
 
2.4%
ValueCountFrequency (%)
O26591
70.9%
N3256
 
8.7%
W3256
 
8.7%
E1598
 
4.3%
B1414
 
3.8%
R1414
 
3.8%
ValueCountFrequency (%)
t26591
25.0%
h26591
25.0%
e26591
25.0%
r26591
25.0%

Most occurring scripts

ValueCountFrequency (%)
Latin143893
81.4%
Common32814
 
18.6%

Most frequent character per script

ValueCountFrequency (%)
O26591
18.5%
t26591
18.5%
h26591
18.5%
e26591
18.5%
r26591
18.5%
N3256
 
2.3%
W3256
 
2.3%
E1598
 
1.1%
B1414
 
1.0%
R1414
 
1.0%
ValueCountFrequency (%)
016407
50.0%
46084
 
18.5%
34055
 
12.4%
83256
 
9.9%
21414
 
4.3%
9799
 
2.4%
1799
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII176707
100.0%

Most frequent character per block

ValueCountFrequency (%)
O26591
15.0%
t26591
15.0%
h26591
15.0%
e26591
15.0%
r26591
15.0%
016407
9.3%
46084
 
3.4%
34055
 
2.3%
N3256
 
1.8%
W3256
 
1.8%
Other values (7)10694
6.1%

annual_salary
Real number (ℝ≥0)

Distinct12677
Distinct (%)39.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean200506481.3
Minimum36
Maximum6655018723
Zeros0
Zeros (%)0.0%
Memory size250.6 KiB

Quantile statistics

Minimum36
5-th percentile15404.925
Q119212
median24062
Q336843
95-th percentile82273854
Maximum6655018723
Range6655018687
Interquartile range (IQR)17631

Descriptive statistics

Standard deviation1086622660
Coefficient of variation (CV)5.419389203
Kurtosis29.75904943
Mean200506481.3
Median Absolute Deviation (MAD)6277
Skewness5.587776499
Sum6.428237789 × 10^12
Variance1.180748805 × 10^18
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
6655018723799
 
2.5%
8227385499
 
0.3%
1825237
 
0.1%
1903234
 
0.1%
1752434
 
0.1%
1950033
 
0.1%
2012432
 
0.1%
1955232
 
0.1%
1981232
 
0.1%
1882429
 
0.1%
Other values (12667)30899
96.4%
ValueCountFrequency (%)
361
< 0.1%
2031
< 0.1%
2601
< 0.1%
2861
< 0.1%
2931
< 0.1%
3621
< 0.1%
4531
< 0.1%
4701
< 0.1%
5131
< 0.1%
5181
< 0.1%
ValueCountFrequency (%)
6655018723799
2.5%
66312189181
 
< 0.1%
66278074401
 
< 0.1%
66235378951
 
< 0.1%
66054160871
 
< 0.1%
65954187061
 
< 0.1%
65725309841
 
< 0.1%
65353817431
 
< 0.1%
65267264261
 
< 0.1%
65223565581
 
< 0.1%

salary_band_text_adj
Categorical

Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
£ yearly
12816 
£. per month
6380 
£ - range
4975 
£. pw
4697 
foreign_ccy
3162 

Length

Max length12
Median length8
Mean length8.958983157
Min length4

Characters and Unicode

Total characters287225
Distinct characters24
Distinct categories7
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row£ yearly
2nd row£ yearly
3rd row£. pw
4th row£ yearly
5th row£. per month
ValueCountFrequency (%)
£ yearly12816
40.0%
£. per month6380
19.9%
£ - range4975
 
15.5%
£. pw4697
 
14.7%
foreign_ccy3162
 
9.9%
.BSD30
 
0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
£28868
39.9%
yearly12816
17.7%
month6380
 
8.8%
per6380
 
8.8%
4975
 
6.9%
range4975
 
6.9%
pw4697
 
6.5%
foreign_ccy3162
 
4.4%
bsd30
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
45198
15.7%
£28868
10.1%
y28794
10.0%
e27333
9.5%
r27333
9.5%
a17791
 
6.2%
n14517
 
5.1%
l12816
 
4.5%
.11107
 
3.9%
p11077
 
3.9%
Other values (14)62391
21.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter193825
67.5%
Space Separator45198
 
15.7%
Currency Symbol28868
 
10.1%
Other Punctuation11107
 
3.9%
Dash Punctuation4975
 
1.7%
Connector Punctuation3162
 
1.1%
Uppercase Letter90
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
y28794
14.9%
e27333
14.1%
r27333
14.1%
a17791
9.2%
n14517
7.5%
l12816
6.6%
p11077
 
5.7%
o9542
 
4.9%
g8137
 
4.2%
m6380
 
3.3%
Other values (6)30105
15.5%
ValueCountFrequency (%)
B30
33.3%
S30
33.3%
D30
33.3%
ValueCountFrequency (%)
£28868
100.0%
ValueCountFrequency (%)
45198
100.0%
ValueCountFrequency (%)
.11107
100.0%
ValueCountFrequency (%)
-4975
100.0%
ValueCountFrequency (%)
_3162
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin193915
67.5%
Common93310
32.5%

Most frequent character per script

ValueCountFrequency (%)
y28794
14.8%
e27333
14.1%
r27333
14.1%
a17791
9.2%
n14517
7.5%
l12816
6.6%
p11077
 
5.7%
o9542
 
4.9%
g8137
 
4.2%
m6380
 
3.3%
Other values (9)30195
15.6%
ValueCountFrequency (%)
45198
48.4%
£28868
30.9%
.11107
 
11.9%
-4975
 
5.3%
_3162
 
3.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII258357
89.9%
None28868
 
10.1%

Most frequent character per block

ValueCountFrequency (%)
£28868
100.0%
ValueCountFrequency (%)
45198
17.5%
y28794
11.1%
e27333
10.6%
r27333
10.6%
a17791
 
6.9%
n14517
 
5.6%
l12816
 
5.0%
.11107
 
4.3%
p11077
 
4.3%
o9542
 
3.7%
Other values (13)52849
20.5%

total_months_with_employer
Real number (ℝ≥0)

ZEROS

Distinct487
Distinct (%)1.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean67.49435434
Minimum0
Maximum695
Zeros494
Zeros (%)1.5%
Memory size250.6 KiB

Quantile statistics

Minimum0
5-th percentile3
Q115
median38
Q392
95-th percentile230
Maximum695
Range695
Interquartile range (IQR)77

Descriptive statistics

Standard deviation77.30166006
Coefficient of variation (CV)1.145305571
Kurtosis5.460396295
Mean67.49435434
Median Absolute Deviation (MAD)29
Skewness2.089231048
Sum2163869
Variance5975.546648
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1543
 
1.7%
11536
 
1.7%
3533
 
1.7%
2523
 
1.6%
10521
 
1.6%
9518
 
1.6%
8513
 
1.6%
7510
 
1.6%
22510
 
1.6%
6502
 
1.6%
Other values (477)26851
83.8%
ValueCountFrequency (%)
0494
1.5%
1543
1.7%
2523
1.6%
3533
1.7%
4486
1.5%
5489
1.5%
6502
1.6%
7510
1.6%
8513
1.6%
9518
1.6%
ValueCountFrequency (%)
6951
< 0.1%
6911
< 0.1%
6441
< 0.1%
6371
< 0.1%
6181
< 0.1%
6171
< 0.1%
6121
< 0.1%
5891
< 0.1%
5821
< 0.1%
5641
< 0.1%

workclass_adj
Categorical

Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.0 MiB
private
22342 
public
4287 
self-emp
3605 
unknown
 
1808
wo-pay
 
13

Length

Max length8
Median length7
Mean length6.978009981
Min length5

Characters and Unicode

Total characters223715
Distinct characters20
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowpublic
2nd rowself-emp
3rd rowprivate
4th rowprivate
5th rowprivate
ValueCountFrequency (%)
private22342
69.7%
public4287
 
13.4%
self-emp3605
 
11.2%
unknown1808
 
5.6%
wo-pay13
 
< 0.1%
never5
 
< 0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
private22342
69.7%
public4287
 
13.4%
self-emp3605
 
11.2%
unknown1808
 
5.6%
wo-pay13
 
< 0.1%
never5
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
p30247
13.5%
e29562
13.2%
i26629
11.9%
a22355
10.0%
r22347
10.0%
v22347
10.0%
t22342
10.0%
l7892
 
3.5%
u6095
 
2.7%
n5429
 
2.4%
Other values (10)28470
12.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter220097
98.4%
Dash Punctuation3618
 
1.6%

Most frequent character per category

ValueCountFrequency (%)
p30247
13.7%
e29562
13.4%
i26629
12.1%
a22355
10.2%
r22347
10.2%
v22347
10.2%
t22342
10.2%
l7892
 
3.6%
u6095
 
2.8%
n5429
 
2.5%
Other values (9)24852
11.3%
ValueCountFrequency (%)
-3618
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin220097
98.4%
Common3618
 
1.6%

Most frequent character per script

ValueCountFrequency (%)
p30247
13.7%
e29562
13.4%
i26629
12.1%
a22355
10.2%
r22347
10.2%
v22347
10.2%
t22342
10.2%
l7892
 
3.6%
u6095
 
2.8%
n5429
 
2.5%
Other values (9)24852
11.3%
ValueCountFrequency (%)
-3618
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII223715
100.0%

Most frequent character per block

ValueCountFrequency (%)
p30247
13.5%
e29562
13.2%
i26629
11.9%
a22355
10.0%
r22347
10.0%
v22347
10.0%
t22342
10.0%
l7892
 
3.5%
u6095
 
2.7%
n5429
 
2.4%
Other values (10)28470
12.7%

target
Categorical

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)0.1%
Missing29033
Missing (%)90.6%
Memory size1.3 MiB
0.0
2787 
1.0
 
240

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters9081
Distinct characters3
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0
ValueCountFrequency (%)
0.02787
 
8.7%
1.0240
 
0.7%
(Missing)29033
90.6%
Histogram of lengths of the category
ValueCountFrequency (%)
0.02787
92.1%
1.0240
 
7.9%

Most occurring characters

ValueCountFrequency (%)
05814
64.0%
.3027
33.3%
1240
 
2.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number6054
66.7%
Other Punctuation3027
33.3%

Most frequent character per category

ValueCountFrequency (%)
05814
96.0%
1240
 
4.0%
ValueCountFrequency (%)
.3027
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common9081
100.0%

Most frequent character per script

ValueCountFrequency (%)
05814
64.0%
.3027
33.3%
1240
 
2.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII9081
100.0%

Most frequent character per block

ValueCountFrequency (%)
05814
64.0%
.3027
33.3%
1240
 
2.6%

Interactions

[Figures: 132 pairwise interaction plots (images not included)]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only its direction does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
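
The calculation described above can be sketched in a few lines of plain Python. This is an illustration of the formula only, not the code pandas-profiling runs; the function name `pearson_r` and the sample data are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of covariance and variances cancel, so sums are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant input gives zero variance and a ZeroDivisionError.
    return cov / math.sqrt(var_x * var_y)

# Made-up sample data, roughly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

r = pearson_r(x, y)
# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
print(r, r_rescaled)
```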

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
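
As a rough illustration of the rank-based calculation above, the sketch below ranks each variable (assuming tied values share their average rank, a common convention) and then applies the Pearson formula to the ranks. The helper names and the sample data are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A monotonic but nonlinear relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this made-up data the ranks of x and y agree exactly, so ρ reaches its maximum while Pearson's r stays below it.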

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
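
The pair-counting definition above can be written out directly. The sketch below implements exactly that definition (often called tau-a); library implementations frequently use a tie-adjusted variant instead, so results may differ when the data contain ties. The function name and sample data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # the variables move in opposite directions
        # product == 0 means a tie: counted as neither concordant nor discordant
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs

# Made-up data: one swapped pair out of six.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```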

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
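
For illustration, a bias-corrected Cramér's V can be sketched as follows. The correction terms are the commonly cited form of Bergsma's proposal; whether pandas-profiling computes it in exactly this way is an assumption, and the function name and sample data are hypothetical.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction (assumed form): shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(r_corr - 1, k_corr - 1)
    if denominator <= 0:
        return 0.0  # a variable with a single category carries no association
    return math.sqrt(phi2_corr / denominator)

# Made-up labels: perfectly associated, then independent.
v_perfect = cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["x"] * 50 + ["y"] * 50)
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25)
print(v_perfect, v_independent)
```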

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

agemarital_statuseducationoccupation_leveleducation_numfamiliarity_FBview_FBinterested_insurancecreated_accounthas_marriededucation_orderjob_title_top_10company_email_addresshours_per_weekcapital_gaincapital_lossnative_countrydemographic_characteristictown_adjpaye_adjannual_salarysalary_band_text_adjtotal_months_with_employerworkclass_adjtarget
039Never-marriedBachelors117790No011Otherjones.com4021740United Kingdom77516EdinburghOther18109.0£ yearly246public0
150Married-civ-spouseBachelors417961No111OtherOther1300United Kingdom83311LeedsOther16945.0£ yearly337self-emp0
238DivorcedHS-grad1212541No09OtherOther4000United Kingdom215646EdinburghOther37908.0£. pw173private0
353Married-civ-spouse11th19920No17OtherOther4000United Kingdom234721EdinburghOther19087.0£ yearly390private0
428Married-civ-spouseBachelors1217891No111OtherOther4000Sweden338409SwindonBR44200032892.0£. per month42private0
537Married-civ-spouseMasters718750No112OtherOther4000United Kingdom284582OtherOther24336.0£. pw47private0
649Married-spouse-absent9th16120No15OtherOther1600Jamaica160187EdinburghOther16392.0£. per month30private0
752Married-civ-spouseHS-grad1312970No19OtherOther4500United Kingdom209642OtherOther37407.0£ yearly29self-emp0
831Never-marriedMasters1218631Yes012OtherOther50140840United Kingdom45781SwindonNW38400039744.0£. per month52private1
942Married-civ-spouseBachelors417531Yes111OtherOther4051780United Kingdom159449OtherOther16785.0£ yearly1private1

Last rows

agemarital_statuseducationoccupation_leveleducation_numfamiliarity_FBview_FBinterested_insurancecreated_accounthas_marriededucation_orderjob_title_top_10company_email_addresshours_per_weekcapital_gaincapital_lossnative_countrydemographic_characteristictown_adjpaye_adjannual_salarysalary_band_text_adjtotal_months_with_employerworkclass_adjtarget
3205049Married-civ-spouseHS-grad812811unknown19Conservator, museum/galleryOther4050130United Kingdom66385EdinburghOther17374400.0foreign_ccy120privateNone
3205122Never-marriedBachelors717360unknown011OtherOther3010550United Kingdom205940EdinburghOther17700.0£. per month17privateNone
3205251Married-civ-spouseSome-college6131071unknown110Event organiserOther5000United Kingdom260938SwindonNW38400022404.0£. per month140self-empNone
3205333Married-civ-spouseHS-grad1012680unknown19OtherOther4034110United Kingdom60567EdinburghOther26627.0£ yearly103privateNone
3205423Never-marriedBachelors817950unknown011OtherOther5000United Kingdom335067EdinburghOther25869.5£ - range25privateNone
3205534Never-marriedHS-grad412741unknown09OtherOther3000United Kingdom331126EdinburghOther19488.0£. per month38privateNone
3205653Divorced12th310871unknown08OtherOther4000United Kingdom156612EdinburghOther15116.0£ yearly167privateNone
3205744Married-civ-spouseBachelors617341unknown111Environmental health practitionerOther4500United Kingdom188436SwindonNW38400023733.0£ yearly163privateNone
3205860WidowedSome-college613621unknown110Event organiserOther4000United Kingdom227468EdinburghOther18617.0£ yearly107privateNone
3205955Married-civ-spouseSome-college813751unknown110OtherOther3800United Kingdom183580SwindonOther22185.0£ yearly247privateNone